An Adaptable Acoustic Architecture in a Multilingual TTS System
Authors
Abstract
This paper presents an adaptable acoustic architecture in a multilingual TTS system. The whole architecture is designed as a data-driven system. Its modules (text preprocessing, grapheme-to-phoneme conversion, lexical stress detection, OOV handling, symbolic prosody prediction, acoustic prosody prediction, and unit selection with concatenation) use either machine learning techniques, especially neural networks (NNs), or language-independent routines. The adaptable and scalable architecture of the acoustic prosody generation module consists of four sub-modules. Duration control uses an NN based on a modified causal-retro-causal error correction architecture (CRCECNN), while F0 generation uses a multilayer perceptron (MLP). In both NN models, a partial weight decay (p-WD) method is applied to optimize each input dimension of the NNs. In contrast to standard weight decay, p-WD helps select a single feature from a group of highly correlated features, so its penalty function yields a minimized input feature set. The third sub-module reuses the predictions of the optimized NNs to establish a hybrid architecture: unit selection based on syllable prosody parameter criteria combines prosody selection with unit selection. Because the database is limited, a post-processing unit is necessary. We focus on the problem of finding optimal speech segments and present a segment selection approach that combines a global parameterized non-linear suitability function with a modified multi-level Viterbi search algorithm. Preliminary acoustic ratings of the TTS system adapted to Slovenian are presented.
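The abstract does not specify the suitability function or the multi-level search, so the following is only a minimal, single-level sketch of Viterbi-based segment selection under stated assumptions: each target position has a list of candidate database units and a predicted (duration, F0) pair, as the prosody NNs might supply. The `Unit` fields, the saturating `target_cost`, the F0-jump `join_cost`, and the name `viterbi_select` are all hypothetical stand-ins, not the authors' global parameterized suitability function or their modified search.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A candidate speech segment (hypothetical fields, for illustration only)."""
    name: str
    duration_ms: float   # segment duration in milliseconds
    f0_start_hz: float   # F0 at the left boundary
    f0_end_hz: float     # F0 at the right boundary


def target_cost(unit, target_duration_ms, target_f0_hz, weight=1.0):
    """Hypothetical non-linear suitability: a saturating penalty on the
    relative deviation of duration and mean F0 from the predicted targets."""
    d = abs(unit.duration_ms - target_duration_ms) / target_duration_ms
    mean_f0 = (unit.f0_start_hz + unit.f0_end_hz) / 2
    f = abs(mean_f0 - target_f0_hz) / target_f0_hz
    return weight * (1.0 - math.exp(-(d * d + f * f)))


def join_cost(left, right):
    """Hypothetical concatenation cost: relative F0 jump at the joint."""
    return abs(left.f0_end_hz - right.f0_start_hz) / max(left.f0_end_hz, 1e-9)


def viterbi_select(candidates, targets):
    """Choose one unit per position, minimizing summed target + join costs.

    candidates: list of lists of Unit, one inner list per target position
    targets:    list of (duration_ms, f0_hz) tuples, one per position
    """
    # cost[i][j]: cheapest total cost of a path ending in candidates[i][j]
    cost = [[target_cost(u, *targets[0]) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]  # back-pointers to the previous position
    for i in range(1, len(candidates)):
        row, brow = [], []
        for u in candidates[i]:
            # best predecessor given the join cost into u
            best_k = min(
                range(len(candidates[i - 1])),
                key=lambda k: cost[i - 1][k] + join_cost(candidates[i - 1][k], u),
            )
            total = (cost[i - 1][best_k]
                     + join_cost(candidates[i - 1][best_k], u)
                     + target_cost(u, *targets[i]))
            row.append(total)
            brow.append(best_k)
        cost.append(row)
        back.append(brow)
    # backtrack from the cheapest candidate at the last position
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    best_total = cost[-1][j]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, best_total


if __name__ == "__main__":
    cands = [
        [Unit("a1", 100, 120, 125), Unit("a2", 140, 200, 210)],
        [Unit("b1", 90, 126, 118), Unit("b2", 95, 205, 190)],
    ]
    targets = [(100, 122), (90, 122)]
    path, total = viterbi_select(cands, targets)
    print([u.name for u in path], round(total, 4))
```

In this toy example the search picks `a1` followed by `b1`, since both match the predicted prosody closely and join with only a small F0 jump. A multi-level variant, as the abstract mentions, would presumably run such a search over several unit granularities; that extension is not shown here.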